Similarity in Computational Sciences

Authors

  • Tina Eliassi-Rad
  • Terence Critchlow
Abstract

The advent of fast computer systems has enabled scientists to visualize and analyze complex phenomena (such as explosions of stars and expressions of genes) [2][3][7][8][9][10][12][13]. These complex phenomena (whether simulated or observed) generate large-scale data sets. For instance, simulations of supernovae easily produce terabytes of data [1]. Given such massive amounts of data, it is not surprising that clustering algorithms are quite popular in computational sciences. In particular, non-projected clustering algorithms (such as k-means, k-medoids, hierarchical, and smooth clustering) are widely used [4][5][6]. A crucial input to any of these clustering algorithms is the similarity function that assigns data objects to specific groups. Depending on the purpose of clustering and the characteristics of the data objects, selecting an appropriate similarity function can be nontrivial. In computational sciences, data objects are represented by n-dimensional vectors in space and time [10][11]. In particular, each element of an n-dimensional data object can be either a scalar quantity (such as density) or a vector quantity (such as velocity). Because both scalar and vector quantities are present within the data objects, it is important that a similarity function encode both magnitude and direction. For instance, the popular Euclidean distance can only measure dissimilarities in scalar quantities and in the magnitude of vector quantities. That is, two vectors $\vec{\alpha}$ and $\vec{\beta}$ are considered identical when $\|\vec{\alpha}\| \equiv \|\vec{\beta}\|$ and $\angle\vec{\alpha} \neq \angle\vec{\beta}$! In contrast, Pearson's correlation coefficient can only measure similarities in the directions of vector quantities. That is, two vectors $\vec{\alpha}$ and $\vec{\beta}$ are considered identical when $\angle\vec{\alpha} \equiv \angle\vec{\beta}$ and $\|\vec{\alpha}\| \neq \|\vec{\beta}\|$! In both of these examples, a similarity function that captures both magnitude and direction will not consider $\vec{\alpha}$ and $\vec{\beta}$ identical!
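The two blind spots described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names `magnitude_gap` and `pearson`, the three sample vectors, and the choice to model the "magnitude-only" comparison as the absolute difference of vector norms.

```python
import math

def magnitude(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def magnitude_gap(a, b):
    """Hypothetical magnitude-only dissimilarity: ignores direction entirely."""
    return abs(magnitude(a) - magnitude(b))

def pearson(a, b):
    """Pearson's correlation coefficient: insensitive to positive rescaling."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

# Illustrative vectors (assumed, not from the paper).
alpha = (1.0, 2.0, 2.0)   # norm 3
beta = (2.0, 1.0, 2.0)    # norm 3, different direction
gamma = (2.0, 4.0, 4.0)   # same direction as alpha, norm 6

# Magnitude-only view: alpha and beta look identical although they point elsewhere.
gap_alpha_beta = magnitude_gap(alpha, beta)

# Direction-only view: alpha and gamma look identical although gamma is twice as long.
r_alpha_gamma = pearson(alpha, gamma)

print(gap_alpha_beta, pearson(alpha, beta))
print(r_alpha_gamma, magnitude_gap(alpha, gamma))
```

In this sketch the magnitude-only measure cannot separate `alpha` from `beta`, and the correlation cannot separate `alpha` from `gamma`; a measure that reports both quantities would distinguish every pair.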

Similar resources

A computational method to analyze the similarity of biological sequences under uncertainty

In this paper, we propose a new method to analyze the difference and similarity of biological sequences, based on the fuzzy sets theory. Considering the sequence order and some chemical and structural properties, we present a computational method to cluster the biological sequences. By some examples, we show that the new method is relatively easy and we are able to compare the sequences of arbi...

INTERVAL-VALUED INTUITIONISTIC FUZZY SETS AND SIMILARITY MEASURE

In this paper, the problem of measuring the degree of inclusion and similarity measure for two interval-valued intuitionistic fuzzy sets is considered. We propose inclusion and similarity measure by using order on interval-valued intuitionistic fuzzy sets connected with lexicographical order. Moreover, some properties of inclusion and similarity measure and some correlation, between them an...

DUNEDIN NEW ZEALAND Investigating Complexities Through Computational Techniques

This article outlines similarity applied to the general environment and geographical information domains. The hypothesis is if physical and social sciences manifest similar amenities, then similarity would be a generative technique to analyse the cached information inherent in the data retrieved. Similarity is examined concerning the spatial grouping of natural kinds in a complex environment.

A Novel Method for Tracking Moving Objects using Block-Based Similarity

Extracting and tracking active objects are two major issues in surveillance and monitoring applications such as nuclear reactors, mine security, and traffic controllers. In this paper, a block-based similarity algorithm is proposed in order to detect and track objects in the successive frames. We define similarity and cost functions based on the features of the blocks, leading to less computati...

Genetic diversity study of Ethiopian hot pepper cultivars (Capsicum spp.) using Inter Simple Sequence Repeat (ISSR) marker

Hot pepper (Capsicum spp.) is an economically important spice widely cultivated and consumed in Ethiopia. In spite of its wide importance, there is no information available on the molecular genetic diversity of this crop. Cultivars characterization is an important link between the conservation and utilization of plant genetic resources in various breeding programs. Using five ISSR prim...

Publication date: 2005